Application of a simple likelihood ratio approximant to protein sequence classification

Authors

  • László Kaján
  • Attila Kertész-Farkas
  • Dino Franklin
  • Neli Ivanova
  • András Kocsor
  • Sándor Pongor
Abstract

MOTIVATION: Likelihood ratio approximants (LRA) have been widely used for model comparison in statistics. The present study explores their utility as a scoring (ranking) function in the classification of protein sequences. RESULTS: We used a simple LRA based on the maximal similarity (or minimal distance) scores of the two top-ranking sequence classes. The scoring methods (Smith-Waterman, BLAST, local alignment kernel and compression-based distances) were compared on datasets designed to test sequence similarities between proteins distantly related in terms of structure or evolution. It was found that LRA-based scoring can significantly outperform simple scoring methods.
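The abstract does not spell out the formula, so the following is only a minimal sketch of one plausible reading: for a query sequence, take the maximal similarity score within each class, then divide the best class's maximum by the runner-up's. The function name `lra_score`, its inputs, and the plain ratio form are assumptions made for illustration, not the authors' implementation. The similarity scores themselves (for example from Smith-Waterman or BLAST) are assumed to be precomputed and positive.

```python
from collections import defaultdict


def lra_score(similarities, labels):
    """Hypothetical LRA-style score for one query sequence.

    similarities: similarity of the query to each reference sequence
    labels: class label of each reference sequence (same order)
    Returns (top_class, ratio of top class max to runner-up class max).
    """
    # Maximal similarity observed within each class.
    best = defaultdict(lambda: float("-inf"))
    for score, label in zip(similarities, labels):
        if score > best[label]:
            best[label] = score

    if len(best) < 2:
        raise ValueError("need reference sequences from at least two classes")

    # Rank classes by their maximal similarity and keep the top two.
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    (top_class, top), (_, second) = ranked[0], ranked[1]

    if second <= 0:
        raise ValueError("runner-up score must be positive for a ratio")

    return top_class, top / second
```

A ratio well above 1 means the top class stands out clearly from the runner-up, while a ratio near 1 marks an ambiguous assignment. For distance-based measures the same idea would apply with minima in place of maxima and the ratio inverted.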


Similar resources

GENERATING FUZZY RULES FOR PROTEIN CLASSIFICATION

This paper considers the generation of some interpretable fuzzy rules for assigning an amino acid sequence into the appropriate protein superfamily. Since the main objective of this classifier is the interpretability of rules, we have used the distribution of amino acids in the sequences of proteins as features. These features are the occurrence probabilities of six exchange groups in the seque...


The modified recombinant proinsulin: a simple and efficient route to produce insulin glargine in E. coli

Background: Recombinant insulin glargine, a long-acting analogue of insulin, is expressed as proinsulin in the host cell and, after purification and refolding steps, is cleaved to active insulin by enzymatic digestion using trypsin and carboxypeptidase B. Since the proinsulin's B and C chains have several internal arginine and lysine residues, a number of impurities are generated following treatment wit...


Ascitic fluid to serum bilirubin ratio for differentiation of exudates from transudates

Background: Given the diagnostic errors of the classic criteria, including the serum-ascites albumin gradient (SAAG), total protein concentration and the adapted Light et al.'s criteria, in distinguishing transudates from exudates, we evaluated the ascitic fluid to serum bilirubin ratio as a new criterion. We also evaluated whether the combination of bilirubin r...


Total Electricity Demand Modeling: An Application of Spatial Panel Econometric Method

This paper aims to model total electricity demand (incremental) in order to estimate price and income elasticities using provincial data and the spatial panel data method. Electricity demand at the province level is influenced by climatic zones, which can be divided into temperate, cold and sub-tropical. This paper uses time series data for electricity demand in Iran’s 28 provinces, taking into...


Optimization of the Analysis of Almond DNA Simple Sequence Repeats (SSRs) Through Submarine Electrophoresis Using Different Agaroses and Staining Protocols

Simple sequence repeats (SSR markers or microsatellites), based on the specific PCR amplification of DNA sequences, are becoming the markers of choice for molecular characterization of a wide range of plants because of their high polymorphism, abundance, and codominant inheritance. Different methods have been used for the analysis of the SSR amplified fragments being submarine agarose electropho...



Journal:
  • Bioinformatics

Volume 22, Issue 23

Pages: -

Publication date: 2006